{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook provides an interactive overview to some if the ideas developed in a [paper](http://www.phil.vt.edu/dmayo/personal_website/Error_Statistics_2011.pdf) entitled \"Error Statistics\", by Mayo and Spanos.\n",
    "\n",
    "#Goals\n",
    "\n",
    "For a sample of data:\n",
    "\n",
    "(1) quantify the extent to which the sample is consistent with coming from a particular, hypothetical data source\n",
    "\n",
    "(2) if inconsistent, determine what _other_, particular data sources is the sample consistent with.\n",
    "\n",
    "\n",
    "# Introduction\n",
    "\n",
    "Two notions of probability:\n",
    "\n",
    "Frequentist: probabilities represent relative frequency of occurence. \n",
    "e.g. $P(X;\\mu)$ speaks to the probability of outcome $X$, given $\\mu$.\n",
    "\n",
    "Bayesian: probabilities represent degrees of belief\n",
    "e.g. $P(\\mu;X)$ speaks to the probability of $\\mu$ being true, given $X$.\n",
    "\n",
    "Both are useful. \n",
    "\n",
    "Bayesian analyses:\n",
    "* incorporate prior knowledge\n",
    "* produce posterior probabilities\n",
    "\n",
    "Freqentist analyses:\n",
    "* allow for lack of prior knowledge\n",
    "* do not produce posterior probabilities\n",
    "\n",
    "\n",
    "Let's explore the use of frequentist statistics. $P(X;\\mu)$ describes a set of probabilities for observed data $X$ given an assumption about the world, parameterized by $\\mu$. Specifically, let's look at the statistics of errors of inference."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from scipy.stats import norm # properties of the distribution\n",
    "from numpy.random import normal # samples from the distribution\n",
    "import numpy as np\n",
    "import scipy\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All hypotheses discussed herein will be expressed with Gaussian / normal distributions. Let's look at the properties of this distribution.\n",
    "\n",
    "Start by plotting it. We'll set the mean to 0 and the width the 1...the _standard normal_ distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x = np.arange(-10, 10, 0.001)\n",
    "plt.plot(x,norm.pdf(x,0,1)) # final arguments are mean and width"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now look at the cumulative distribution function of the standard normal, which integrates from negative infinity up to the function argument, on a unit-normalized distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "norm.cdf(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function also accepts a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "norm.cdf([-1., 0, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's be more explicit about the parameters of the distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "mu = 0\n",
    "sigma = 1\n",
    "norm(loc=mu, scale=sigma)\n",
    "norm.cdf([-1., 0, 1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sigma=2\n",
    "mu = 0\n",
    "n = norm(loc=mu, scale=sigma)\n",
    "n.cdf([-1., 0, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to exploring properties of the exact function, we can sample points from it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "[normal() for _ in range(5)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also approximate the exact distribution by sampling a large number of points from it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "size = 1000000\n",
    "num_bins = 300\n",
    "\n",
    "plt.hist([normal() for _ in range(size)],num_bins)\n",
    "plt.xlim([-10,10])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Data samples\n",
    "\n",
    "If we have sample of points, we can summarize them in a model-nonspecific way by calculating the mean.\n",
    "\n",
    "Here, we draw them from a Gaussian for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n = 10\n",
    "my_sample = [normal() for _ in range(n)]\n",
    "my_sample_mean = np.mean(my_sample)\n",
    "print(my_sample_mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Now let's generate a large number of data samples and plot the corresponding distribution of sample means."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n = 10\n",
    "means_10 = []\n",
    "for _ in range(10000):\n",
    "    my_sample = [normal() for _ in range(n)]\n",
    "    my_sample_mean = np.mean(my_sample)\n",
    "    means_10.append(my_sample_mean)\n",
    "\n",
    "    \n",
    "plt.hist(means_10,100)\n",
    "plt.xlim([-1.5,1.5])\n",
    "plt.xlabel(\"P(mean(X))\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n = 100\n",
    "means_100 = []\n",
    "for _ in range(10000):\n",
    "    my_sample = [normal() for _ in range(n)]\n",
    "    my_sample_mean = np.mean(my_sample)\n",
    "    means_100.append(my_sample_mean)\n",
    "    \n",
    "plt.hist(means_100,100)\n",
    "plt.xlim([-1.5,1.5])\n",
    "plt.xlabel(\"P(mean(X))\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# show 1/sqrt(n) scaling of deviation\n",
    "n_s = []\n",
    "std_100 = []\n",
    "for i in range(1, 1000, 50):\n",
    "    means_100 = []\n",
    "    for _ in range(5000):\n",
    "        my_sample = [normal() for _ in range(i)]\n",
    "        my_sample_mean = np.mean(my_sample)\n",
    "        means_100.append(my_sample_mean)\n",
    "    my_sample_std = np.std(means_100)\n",
    "    std_100.append(1./(my_sample_std*my_sample_std))\n",
    "    n_s.append(i)\n",
    "    \n",
    "plt.scatter(n_s,std_100)\n",
    "plt.xlim([0,1000])\n",
    "plt.ylabel(\"std(mean(X;sample))\")\n",
    "plt.xlabel(\"sample\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that by increasing the number of data points, the variation on the mean decreases.\n",
    "\n",
    "\n",
    "Notation: the variable containing all possible n-sized sets of samples is called $X$. A specific $X$, like the one actually observed in an experiment, is called $X_0$. \n",
    "\n",
    "## What can we say about the data?\n",
    "\n",
    "* are the data consistent with having been sampled from a certain distribution?\n",
    "* if not, what distribution _are_ they consistent with?\n",
    "\n",
    "# Hypotheses\n",
    "\n",
    "In our tutorial, a hypothesis is expressed as a distribution from which the data may have been drawn. Our goal is to provide a procedure for rejection of the null hypothesis, and, in the case of rejecting the null, provide warranted inference of one or more alternate hypotheses.\n",
    "\n",
    "**Simplification**: the hypothesis space is defined as all normal distributions with variable mean $\\mu$ and fixed variance. Generalizing this assumption changes almost nothing.\n",
    "\n",
    "**Corollary**: the hypothesis space is one-dimensional, and the logical _not_ of a hypothesis is simple to comprehend.\n",
    "\n",
    "## A Test Statistic\n",
    "\n",
    "To relate observed data to hypotheses, we need to define a _test statistic_, which summarizes a particular experimental result. This statistic is also a function of the hypothesis, and will have different sampling distributions under different hypotheses.\n",
    "\n",
    "$d(X;H_\\mu) = (\\bar X - \\mu)/(\\sigma/\\sqrt n)$,\n",
    "\n",
    "where $\\bar X$ is the mean of $X$. For Gaussian hypotheses, $d$ is distributed as a unit normal. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def d(X=[0], mu = 0, sigma = 1):\n",
    "    X_bar = np.mean(X)\n",
    "    return (X_bar - mu) / sigma * np.sqrt(len(X))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n = 10\n",
    "my_sample = [normal() for _ in range(n)]\n",
    "d(my_sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's numerically determine the sampling distribution under the hypothesis: $H_0$: $\\mu = 0, \\sigma = 1$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "size = 100000\n",
    "n = 10\n",
    "d_sample = []\n",
    "\n",
    "for _ in range(size):\n",
    "    my_sample = [normal() for _ in range(n)] # get a sample of size n\n",
    "    d_sample.append(d(my_sample)) # add test statistic for this sample to the list\n",
    "plt.hist(d_sample,100)\n",
    "plt.xlabel(\"P(d(X);H0)\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With this sampling distribution (which can be calculated exactly), we know exectly how likely a particular result $d(X_0)$ is. We also know how likely it is to observe a result that is even less probable than $d(X_0)$,  $P(d(X) > d(X_0); \\mu)$.\n",
    "\n",
    "## Rejecting the null\n",
    "\n",
    "This probability is the famous `p-value`. When the `p-value` for a particular experimental outcome is less that some pre-determined amount (usually called $\\alpha$), we can:\n",
    "\n",
    "* infer that $H_0$ is falsified at level $\\alpha$\n",
    "* take the action that has been specified for this situation\n",
    "* infer that $X_0$ indicates something about an alternate hypothesis. If $H_0$ corresponds to $\\mu = \\mu_0$, then we infer that $\\mu > \\mu_0 + \\delta$.\n",
    "\n",
    "If $H_0$ is rejected, we can now also begin to speak about statistical properties of $H_1$ where $H_1 != H_0$.\n",
    "\n",
    "## Neyman-Pearson digression\n",
    "\n",
    "The traditional frequentist procedure (due to Neyman and Pearson) is to construct a test that fixes the probability of rejecting $H_0$ when it’s true, and maximizes the _power_: the probability of statistical similarity with $H_1$ when it is true. In other words, for a fixed probability of rejecting $H_0$ when it's true, maximize the probability of accepting $H_1$ when it's true.\n",
    "\n",
    "![Alt text](error.jpg)\n",
    "\n",
    "The N-P construction is fixed before $X_0$ is observed. We wish to extend this and, when $H_0$ is rejected, infer regions of alternate parameter space that are _severely tested_ by the outcome $X_0$.\n",
    "\n",
    "## Inference of an alternate hypothesis\n",
    "\n",
    "When the null hypothesis is rejected, we are interested in _ranges of alternate hypotheses that, if not true, are highly likely to have produced a test statistic less significant than $d(X_0)$_. We say these ranges of parameters space, which can be thought of as composite hypotheses, have been _severely_ tested. We call the level of testing _severity_ and it is a function of the observed data ($X_0$), the range of alternate hypothesis ($H$), and the test constuction itself.\n",
    "\n",
    "**This is the point of the tutorial**: we are warrented to infer ranges of hypothesis space when that range has been severely tested."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# look at the distributions of sample means for two hypotheses\n",
    "\n",
    "def make_histograms(mu0=0,mu1=1,num_samples=10000,n=100,sigma=1):\n",
    "\n",
    "    #d0_sample = []\n",
    "    #d1_sample = []\n",
    "\n",
    "    m0_sample = []\n",
    "    m1_sample = []\n",
    "    for _ in range(num_samples):\n",
    "        H0_sample = [normal(loc=mu0,scale=sigma) for _ in range(n)] # get a sample of size n from H0\n",
    "        H1_sample = [normal(loc=mu1,scale=sigma) for _ in range(n)] # get a sample of size n from H1\n",
    "\n",
    "        m0_sample.append( np.mean(H0_sample) ) # add mean for this sample to the m0 list\n",
    "        m1_sample.append( np.mean(H1_sample) ) # add mean for this sample to the m1 list\n",
    "\n",
    "        # remember that the test statistic is unit-normal-distributed for Gaussian hypotheses,\n",
    "        # so these distributions should be identical\n",
    "        #d0_sample.append( d(H0_sample,mu0,sigma) ) # add test statistic for this sample to the d0 list\n",
    "        #d1_sample.append( d(H1_sample,mu1,sigma) ) # add test statistic for this sample to the d1 list\n",
    "\n",
    "    plt.hist(m0_sample,100,label=\"H0\")\n",
    "    plt.hist(m1_sample,100,label=\"H1\")\n",
    "    plt.xlabel(r\"$\\bar{X}$\")\n",
    "    plt.legend()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "num_samples = 10000\n",
    "n = 100\n",
    "\n",
    "mu0 = 0\n",
    "mu1 = 1\n",
    "sigma=2\n",
    "\n",
    "make_histograms(mu0=mu0,mu1=mu1,num_samples=num_samples,n=n,sigma=sigma)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, imagine that we observe $\\bar X_0 = 0.4$. The probability of $\\bar X > 0.4$ is less than $2\\%$ under $H_0$, so let's say we've rejected $H_0$.\n",
    "\n",
    "Question, what regions of $\\mu$ (defined as $\\mu > \\mu_1$) have been severely tested?\n",
    "\n",
    "$SEV(\\mu>\\mu_1) = P(d(X)<d(X_0);!(\\mu>\\mu_1)) = P(d(X)<d(X_0); \\mu<=\\mu_1)$ ---> $P(d(X)<d(X_0);\\mu = \\mu_1)$\n",
    "\n",
    "So we only need to calculate the probability of a result less anomalous than $d(X_0)$, given $\\mu_1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# severity for the interval: mu > mu_1\n",
    "\n",
    "# note that we calculate the probability in terms of the _lower bound_ of the interval, \n",
    "# since it will provide the _lowest_ severity\n",
    "\n",
    "def severity(mu_1=0, x=[0], mu0=0, sigma=sigma, n=100):\n",
    "    # find the mean of the observed data\n",
    "    x_bar = np.mean(x)\n",
    "    # calculate the test statistic w.r.t. mu_1\n",
    "    dx = (x_bar - mu_1)/sigma*np.sqrt(n)\n",
    "    # the test statistic is distributed as a unit normal\n",
    "    n = norm()\n",
    "    return n.cdf(dx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate the severity of an outcome that is rather unlike (is lower) than the lower bound of a range of alternate hypotheses ($\\mu > \\mu_1$)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sigma = 2\n",
    "mu_1 = 0.2\n",
    "x = [0.4]\n",
    "severity(mu_1=mu_1,x=x,sigma=sigma)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "num_samples = 10000\n",
    "n = 100\n",
    "\n",
    "mu0 = 0\n",
    "mu1 = 0.2\n",
    "sigma=2\n",
    "\n",
    "make_histograms(mu0=mu0,mu1=mu1,num_samples=num_samples,n=n,sigma=sigma)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calculate the severity for a set of observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x_bar_values = [[0.4],[0.6],[1.]]\n",
    "color_indices = [\"b\",\"k\",\"r\"]\n",
    "\n",
    "for x,color_idx in zip(x_bar_values,color_indices):\n",
    "    mu_values = scipy.linspace(0,1,100)\n",
    "    sev = [severity(mu_1=mu_1,x=x,sigma=sigma) for mu_1 in mu_values]\n",
    "    plt.plot(mu_values,sev,color_idx,label=x)\n",
    "    plt.ylim(0,1.1)\n",
    "    plt.ylabel(\"severity for $H: \\mu > \\mu_1$\")\n",
    "    plt.legend(loc=\"lower left\")\n",
    "    plt.xlabel(r\"$\\mu_1$\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Get some intuition by considering some test points:\n",
    "\n",
    "* given the interval $\\mu > \\mu_1 = 0.4$, the probability to observe something less than or equal to $0.4$ (blue line) is $0.5$. The interval is not severely tested and its inference is not warrented.\n",
    "* a result of $\\bar X_0 = 1$ warrents inference of $\\mu \\sim> 0.7$. \n",
    "* warrented intervals of inference are dependant on the observed outcome ($\\bar X$ values)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
